Abstract

Asymptotic factorizations for the small-ball probability (SmBP) of a Hilbert-valued
random element $X$ are rigorously established and discussed. In particular, given the
first $d$ principal components (PCs) and as the radius $\varepsilon$ of the ball tends to zero, the
SmBP is asymptotically proportional to (a) the joint density of the first $d$ PCs, (b) the
volume of the $d$-dimensional ball with radius $\varepsilon$, and (c) a correction factor weighting
the use of a truncated version of the process expansion. Moreover, under suitable
assumptions on the spectrum of the covariance operator of $X$ and as $d$ diverges to
infinity when $\varepsilon$ vanishes, some simplifications occur. In particular, the SmBP factorizes
asymptotically as the product of the joint density of the first $d$ PCs and a pure volume
parameter. All the provided factorizations allow one to define a surrogate intensity of the
SmBP that, in some cases, leads to a genuine intensity. To operationalize the stated
results, a non-parametric estimator for the surrogate intensity is introduced, and it is
proved that the use of estimated PCs, instead of the true ones, does not affect the rate
of convergence. Finally, as an illustration, simulations in controlled frameworks are
provided.

Keywords. Hilbert functional data; Small Ball Probability; Karhunen-Loève decomposition; kernel density estimate.


Introduction 


For a random element $X$ valued in a general metric space, the measure of how it concentrates over such space plays a central role in statistical analysis. If $X$ is a real random vector, its joint density is, in a natural way, that measure. In fact, in practical situations, the density is helpful in defining mixture models, in detecting latent structure, in discriminant analysis, in robust statistics to identify outliers, and so on. When the observed data are curves, the infinite-dimensional nature of the space to which the data belong raises problems in defining an object that plays the same role as the joint density in the multivariate context. The main problem is that, without an underlying dominating probability measure, the Radon-Nikodym derivative cannot be straightforwardly applied. To manage this, a concept of "surrogate density" is derived from the notion of small-ball probability (SmBP in the sequel) of a random element $X$, that is


$$\varphi(x,\varepsilon) = \mathbb{P}\left(\Delta(X,x) < \varepsilon\right),$$


where $x$ is in the same space where $X$ takes its values, $\varepsilon$ is a positive real number and $\Delta$ is a semi-metric. The behaviour of $\varphi(x,\varepsilon)$, as $\varepsilon$ vanishes, provides information about the way in which $X$ concentrates at $x$. This feature has stimulated the study of the SmBP in different settings: the limiting behaviour has been developed from a theoretical point of view (for instance, refer to the small tails/deviations theory in Li and Shao, 2001; Lifshits, 2012 and references therein); in functional statistics the SmBP was used to derive asymptotics in mode estimation (see, e.g., Dabo-Niang et al., 2007; Delaigle and Hall, 2010; Ferraty et al., 2012; Gasser et al., 1998), as well as in the non-parametric regression literature in evaluating the rate of convergence of estimators (see, e.g., Ferraty and Vieu, 2006; Ferraty et al., 2007).

Often, the need for a surrogate density for $X$ has led authors to assume (as done, for instance, in Ferraty et al., 2012; Gasser et al., 1998) that

$$\varphi(x,\varepsilon) = \Psi(x)\,\phi(\varepsilon) + o(\phi(\varepsilon)), \qquad (1)$$


where $\Psi$, depending only on the center $x$, plays the role of the surrogate density of the random element $X$, whilst $\phi(\varepsilon)$ is a kind of "volume parameter" which does not depend on the spatial term. It is worth noticing that $\Psi$ is also the intensity of the SmBP.

Although breaking the dependence on $x$ and $\varepsilon$ supplies a clear modelling advantage and the existence of $\Psi(x)$ is desirable (especially from a statistical perspective), factorization (1) can be derived only in particular settings. Notable examples are the case of Gaussian processes (e.g. Li and Shao, 2001; Lifshits, 2012 and references therein) and that of fractal processes for suitable semi-norms $\Delta$ (e.g. Ferraty and Vieu, 2006, Chapter 13). Hence, a crucial task is to study asymptotic factorizations of the SmBP that lead to a definition of its intensity or, at least, when it is not possible to completely isolate the dependence on $x$ and $\varepsilon$, of a surrogate intensity.

In what follows, we assume $X$ to be a random element in a Hilbert space, with $\Delta$ being the metric induced by the Hilbert norm and, without loss of generality, we deal with random curves in the space of square integrable functions on the unit interval. A first factorization of the SmBP that allows one to define a surrogate intensity was provided by Delaigle and Hall (2010): besides some technical hypotheses on the spectrum of the covariance operator of $X$, and assuming that the principal components of $X$ are independent with positive and sufficiently smooth marginal density functions $\{f_j\}$, the authors showed that

$$\varphi(x,\varepsilon) \sim \prod_{j\le d} f_j(x_j)\,\phi(\varepsilon,d), \qquad \varepsilon \to 0, \qquad (2)$$
where $x_j$ is the projection of $x$ onto the $j$-th principal axis, $\phi(\varepsilon,d)$ is a volumetric term and $d = d(\varepsilon)$ is the number of considered terms of the decomposition, diverging to infinity as $\varepsilon$ vanishes. Now, from the application point of view, the independence assumption appears quite restrictive, and the spatial factor $\prod_{j\le d} f_j$ is just a surrogate intensity of the SmBP because of the dependence between $d$ and $\varepsilon$. Moreover, one may wonder whether principal component analysis is necessary to obtain the factorization.

The first part of the present work is devoted to proposing more general factorizations for the SmBP that relax the hypothesis of independence, and to identifying situations in which it is possible to obtain a genuine intensity. The first stated factorization holds for any positive integer $d$:

$$\varphi(x,\varepsilon) \sim f_d(x_1,\dots,x_d)\,V_d(\varepsilon)\,\mathcal{R}(x,\varepsilon,d), \qquad \text{as } \varepsilon \to 0,$$

where $f_d$ is the joint density of the first $d$ principal components, $V_d(\varepsilon)$ is the volume of a $d$-dimensional ball with radius $\varepsilon$ and $\mathcal{R}(x,\varepsilon,d) \in (0,1]$ denotes an extra factor compensating the use of $(x_1,\dots,x_d)$ instead of $x$. Such a factorization benefits from the fact that $d$ is fixed but, in general, a genuine intensity cannot be defined because $\mathcal{R}$ depends on both $x$ and $\varepsilon$. This dependence is bypassed, and hence an intensity obtained, if one introduces suitable assumptions on the probability law of the process and/or on the point $x$ at which the factorization is evaluated.

Moving from this first factorization, we prove that

$$\varphi(x,\varepsilon) \sim f_d(x_1,\dots,x_d)\,\phi(\varepsilon,d), \qquad \text{as } \varepsilon \to 0 \text{ and } d(\varepsilon) \to \infty,$$

where $\phi(\varepsilon,d)$ is a volume parameter that depends on the decay rate of $\{\lambda_j\}$, the eigenvalues of the covariance operator of $X$. This result makes the joint density, in a very natural way, a candidate surrogate intensity of the SmBP and, under suitable assumptions on $X$, allows one to define an intensity. Furthermore, it turns out that the first factorization is basis free, while for the second one the principal components basis is optimal in some sense.

In the second part of the paper, in order to make the surrogate intensity of the SmBP available for statistical purposes, we propose a multivariate kernel density approach to estimate $f_d$. Under general conditions, we prove that, although the estimation procedure involves the estimated principal components instead of the true ones, the estimator achieves the classical non-parametric rate of convergence. To show how this estimator performs in finite sample frameworks, we study its behaviour by means of simulated processes whose intensity is known.

The paper is organized as follows: Section 1 introduces the framework. Section 2 considers the factorization of the SmBP when $d$ is fixed, whereas Section 3 considers the case in which $d$ diverges to infinity as $\varepsilon$ vanishes. Section 4 provides the statistical asymptotic theorem on the estimation of the joint density $f_d$. Section 5 illustrates some numerical examples. Finally, Section 6 collects all the proofs.

1 Preliminaries 

Let $(\Omega, \mathcal{F}, \mathbb{P})$ be a probability space and $\mathcal{L}^2_{[0,1]}$ be the Hilbert space of square integrable real functions on $[0,1]$ endowed with the inner product $\langle g,h\rangle = \int_0^1 g(t)h(t)\,dt$ and the induced norm $\|g\|^2 = \langle g,g\rangle$. Consider a measurable map $X$ defined on $(\Omega,\mathcal{F})$ taking values in $(\mathcal{L}^2_{[0,1]}, \mathcal{B})$, where $\mathcal{B}$ denotes the Borel sigma-algebra induced by $\|\cdot\|$. Denote by

$$\mu_X = \{\mathbb{E}[X(t)],\ t\in[0,1]\}, \qquad \text{and} \qquad \Sigma[\cdot] = \mathbb{E}\left[\langle X-\mu_X,\cdot\rangle\,(X-\mu_X)\right],$$

the mean function and covariance operator of $X$, respectively. Let us consider the Karhunen-Loève expansion associated to $X$ (see e.g. Bosq, 2000): denoting by $\{\lambda_j,\xi_j\}_{j=1}^{\infty}$ the decreasing to zero sequence of non-negative eigenvalues and their associated orthonormal eigenfunctions of the covariance operator $\Sigma$, the random curve $X$ admits the following representation

$$X(t) = \mu_X(t) + \sum_{j=1}^{\infty}\theta_j\,\xi_j(t), \qquad 0\le t\le 1,$$

where $\theta_j = \langle X-\mu_X,\xi_j\rangle$ are the so-called principal components (PCs in the sequel) of $X$, satisfying

$$\mathbb{E}[\theta_j] = 0, \qquad \mathrm{Var}(\theta_j) = \lambda_j, \qquad \mathbb{E}[\theta_j\theta_{j'}] = 0, \quad j\ne j'.$$

It is worth recalling that $\{\xi_j\}_{j=1}^{\infty}$ provides an orthonormal basis of the considered Hilbert space and that the Karhunen-Loève expansion, taking advantage of the underlying Euclidean structure, isolates the manner in which the random function $X(\omega,t)$ depends upon $t$ and upon $\omega$.
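As a concrete illustration of the expansion, the following minimal sketch takes the Wiener process on $[0,1]$ (whose eigenelements are recalled in D.1) as an assumed test case and checks numerically that the truncated sum $\sum_j \lambda_j \xi_j(s)\xi_j(t)$ approaches the covariance function $\min(s,t)$. The function names and the truncation level are illustrative choices, not part of the paper.

```python
import math


def wiener_eigen(j):
    """j-th eigenvalue and eigenfunction of the covariance min(s, t) on [0, 1]."""
    freq = (j - 0.5) * math.pi
    lam = 1.0 / freq ** 2

    def xi(t):
        return math.sqrt(2.0) * math.sin(freq * t)

    return lam, xi


def truncated_covariance(s, t, n_terms=5000):
    """Sum over j <= n_terms of lambda_j * xi_j(s) * xi_j(t)."""
    total = 0.0
    for j in range(1, n_terms + 1):
        lam, xi = wiener_eigen(j)
        total += lam * xi(s) * xi(t)
    return total


if __name__ == "__main__":
    for s, t in [(0.3, 0.7), (0.5, 0.5), (1.0, 1.0)]:
        print(s, t, truncated_covariance(s, t), min(s, t))
```

The truncation error is of order $1/n_{\mathrm{terms}}$, reflecting the slow (polynomial) decay of the Wiener eigenvalues.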

In order to achieve our aims, let us consider the following assumptions. 

(A-1) The process is centered, that is $\mu_X = 0$.

(A-2) The center of the ball $x \in \mathcal{L}^2_{[0,1]}$ is sufficiently close to the process in its high-frequency part, that is

$$x_j^2 \le C_1\lambda_j, \qquad \text{for any } j\ge 1, \qquad (3)$$

for some positive constant $C_1$, where $x_j = \langle x,\xi_j\rangle$.

The latter is not a restrictive condition, since it holds whenever $x$ belongs to the reproducing kernel Hilbert space generated by the process $X$:

$$RKHS(X) = \Big\{ x \in \mathcal{L}^2_{[0,1]} : \sum_{j\ge 1} \lambda_j^{-1} x_j^2 < \infty \Big\}. \qquad (4)$$

Roughly speaking, $x$ is an element of $RKHS(X)$ only if it is at least as smooth as the covariance function; see Berlinet and Thomas-Agnan, 2004, p. 13 and p. 69. Note that $RKHS(X)$ is a very large subspace of $\mathcal{L}^2_{[0,1]}$ including the finite dimensional ones; in fact, if, for some $d\in\mathbb{N}$, $x_j = 0$ for any $j > d$, then $x\in RKHS(X)$. Furthermore, (A-2) is not unusual since it is equivalent to $\sup_{j\ge 1}\mathbb{E}[(\theta_j - x_j)^2/\lambda_j] < \infty$, which was used, for a similar purpose, by Delaigle and Hall, 2010, Condition (4.1).


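(A-3) Denote by $\Pi_d$ the projector onto the $d$-dimensional space spanned by $\{\xi_j\}_{j=1}^{d}$. The first $d$ PCs, namely $\theta = \Pi_d X = (\theta_1,\dots,\theta_d)'$, admit a joint strictly positive probability density, namely $f_d(\cdot)$. Moreover, $f_d$ is twice differentiable at $\vartheta = (\vartheta_1,\dots,\vartheta_d)' \in \mathbb{R}^d$ and there exists a positive constant $C_2$ (not depending on $d$) for which

$$\left|\frac{\partial^2 f_d}{\partial\vartheta_i\,\partial\vartheta_j}(\vartheta)\right| \le \frac{C_2}{\sqrt{\lambda_i\lambda_j}}\, f_d(x_1,\dots,x_d), \qquad (5)$$

for any $d\in\mathbb{N}$, $i,j\in\{1,\dots,d\}$ and $\vartheta \in D = \big\{\vartheta\in\mathbb{R}^d : \sum_{j\le d}(\vartheta_j - x_j)^2 \le \rho^2\big\}$ for some $\rho \ge \varepsilon$.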


From now on, with a slight abuse of notation and when it is clear from the context, $f_d(x)$ denotes $f_d(x_1,\dots,x_d)$.

To better appreciate the meaning of (5), note that it can be derived in an intuitive way by considering $W = (W_1,\dots,W_d)'$, the deterministic translation of the component-wise standardized version of the PCs, defined by

$$W_j = \frac{1}{\sqrt{\lambda_j}}\langle X - x,\xi_j\rangle = \frac{\theta_j - \langle x,\xi_j\rangle}{\sqrt{\lambda_j}}.$$

In fact, (5) is equivalent to assuming the boundedness of the second derivatives of the probability density function $g_d$ of the random vector $W$. Since the latter is a linear transformation of $\theta$, condition (5) is equivalent to

$$\left|\frac{\partial^2 g_d}{\partial w_i\,\partial w_j}(w)\right| \le C_2\, g_d(0),$$

for any $d\in\mathbb{N}$, $i,j\in\{1,\dots,d\}$ and $w \in D' = \big\{w\in\mathbb{R}^d : \sum_{j\le d} w_j^2\lambda_j \le \rho^2\big\}$ for some $\rho\ge\varepsilon$. It is worth noting that (A-3) is not restrictive: it includes, for instance, the case of Gaussian Hilbert-valued processes.


2 Approximation results for a given d 


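To state the main result of this section, let us consider a finite positive integer $d$, a given point $x\in\mathcal{L}^2_{[0,1]}$, and define

$$S = S(x,\varepsilon,d) = \frac{1}{\varepsilon^2}\sum_{j\ge d+1}(\theta_j - x_j)^2, \qquad \mathcal{R}(x,\varepsilon,d) = \mathbb{E}\left[(1-S)^{d/2}\,\mathbb{1}_{\{S\le 1\}}\right], \qquad (6)$$

and

$$V_d(\varepsilon) = \frac{\varepsilon^d\,\pi^{d/2}}{\Gamma(d/2+1)},$$

that is, the volume of the $d$-dimensional ball with radius $\varepsilon$.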
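The volume term is easy to evaluate numerically. The following minimal sketch (the function name `ball_volume` is an illustrative choice) implements the formula above with the standard library Gamma function and recovers the familiar low-dimensional cases.

```python
import math


def ball_volume(d, eps):
    """Volume of the d-dimensional Euclidean ball of radius eps."""
    return eps ** d * math.pi ** (d / 2) / math.gamma(d / 2 + 1)


if __name__ == "__main__":
    # d = 1 gives the length 2*eps, d = 2 the area pi*eps^2.
    for d in (1, 2, 3, 10):
        print(d, ball_volume(d, 0.5))
```

For large $d$ the value underflows quickly, so in practice one may prefer to work with its logarithm.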

Theorem 1 Let $X$ be a process as above, $\varphi(x,\varepsilon)$ be the small-ball probability of $X$, assume (A-1), ..., (A-3) and define

$$\varphi_d(x,\varepsilon) = f_d(x)\,V_d(\varepsilon)\,\mathcal{R}(x,\varepsilon,d), \qquad \text{for } \varepsilon>0. \qquad (7)$$

Then

$$|\varphi(x,\varepsilon) - \varphi_d(x,\varepsilon)| \le C_2\,\frac{\varepsilon^2}{2\lambda_d}\,\varphi_d(x,\varepsilon), \qquad \text{for } \varepsilon>0, \qquad (8)$$

that is

$$\varphi(x,\varepsilon) \sim f_d(x)\,V_d(\varepsilon)\,\mathcal{R}(x,\varepsilon,d), \qquad \text{for } \varepsilon\to 0. \qquad (9)$$

In other words, for a fixed positive integer $d$ and as $\varepsilon\to 0$, the above theorem states that the SmBP $\varphi(x,\varepsilon)$ behaves as $\varphi_d(x,\varepsilon)$, that is, as the usual first order approximation of the SmBP in a $d$-dimensional space, $f_d(x)V_d(\varepsilon)$, up to the scale factor $\mathcal{R}(x,\varepsilon,d)$. The latter, depending on $x$ only through its high-frequency components $\{x_j\}_{j\ge d+1}$, can be interpreted as a corrective factor compensating the use of a truncated version of the process expansion. Note that changing $d$ affects all the terms in the factorization but not the asymptotic (9).
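To get a feeling for the size of the corrective factor, the sketch below estimates $\mathcal{R}(x,\varepsilon,d) = \mathbb{E}[(1-S)^{d/2}\mathbb{1}_{\{S\le1\}}]$ by plain Monte Carlo. It rests on assumptions that are not part of the theorem: the PCs are taken independent and Gaussian, the eigenvalues are those of the Wiener process, the infinite tail of the expansion is truncated at 100 terms, and $x$ has no high-frequency part. All names are illustrative.

```python
import math
import random


def correction_factor(eps, d, x_tail, lam_tail, n_rep=2000, seed=0):
    """Monte Carlo estimate of R(x, eps, d) for independent Gaussian PCs.

    x_tail and lam_tail hold x_j and lambda_j for j = d+1, d+2, ...
    (a finite truncation of the tail; an approximation).
    """
    rng = random.Random(seed)
    acc = 0.0
    for _ in range(n_rep):
        s = sum((rng.gauss(0.0, math.sqrt(lam)) - xj) ** 2
                for xj, lam in zip(x_tail, lam_tail)) / eps ** 2
        if s <= 1.0:
            acc += (1.0 - s) ** (d / 2)
    return acc / n_rep


d = 5
lam_tail = [1.0 / ((j - 0.5) * math.pi) ** 2 for j in range(d + 1, d + 101)]
x_tail = [0.0] * len(lam_tail)
r_large = correction_factor(1.0, d, x_tail, lam_tail)  # larger radius
r_small = correction_factor(0.2, d, x_tail, lam_tail)  # smaller radius
print(r_large, r_small)
```

With $d$ fixed, shrinking $\varepsilon$ inflates $S$ and pushes the factor away from one, which is why $d$ has to grow with $1/\varepsilon$ in Section 3.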

Because of $\mathcal{R}(x,\varepsilon,d)$, the dependence on $x$ and $\varepsilon$ cannot be isolated in (9) and hence an intensity of the SmBP is not, in general, available. On the other hand, there exist some specific situations in which a genuine intensity can be defined from the above factorization. We have identified three of these, namely:

a) $\mathcal{R}(x,\varepsilon,d)$ is independent of $x$,

b) there exists a finite positive integer $d_0$ such that, for any $d\ge d_0$, $\mathcal{R}(x,\varepsilon,d) = 1$,

c) for any $x$,

$$\begin{cases}\mathcal{R}(x,\varepsilon,d)\to 1,\\ \varphi(x,\varepsilon)\sim f_d(x)V_d(\varepsilon),\end{cases} \qquad \varepsilon\to 0,\ d(\varepsilon)\to\infty.$$

In the following, we discuss points a) and b), whereas point c) is discussed in Section 3 since it requires additional arguments. In the last discussion we spend a few words on the consequences of choosing a basis different from the Karhunen-Loève one.

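D.1 - $\mathcal{R}(x,\varepsilon,d)$ is independent of $x$. Consider, for instance, $x_j = 0$ for any $j\ge d_0+1$ (i.e. $x$ belongs to the space spanned by $\{\xi_1,\dots,\xi_{d_0}\}$). Then, applying Theorem 1 for any $d\ge d_0$, we have

$$\varphi(x,\varepsilon) \sim f_d(x)\,V_d(\varepsilon)\,\mathcal{R}(\varepsilon,d), \qquad \text{as } \varepsilon\to 0,$$

where $V_d(\varepsilon)\mathcal{R}(\varepsilon,d)$ represents a pure volumetric term while $f_d$ is an intensity of the SmBP evaluated at $x$. Let us consider the remarkable case of a Gaussian process, for which

$$\varphi(x,\varepsilon) \sim \exp\Big\{-\frac{1}{2}\sum_{j\le d}\frac{x_j^2}{\lambda_j}\Big\}\,\frac{V_d(\varepsilon)\,\mathcal{R}(\varepsilon,d)}{\prod_{j\le d}\sqrt{2\pi\lambda_j}} = \Psi_d(x)\cdot\mathcal{V}_d(\varepsilon), \qquad \text{as } \varepsilon\to 0,$$

where, for any $d\ge d_0$, $\Psi_d(x) = \Psi_{d_0}(x) = \exp\big\{-\sum_{j\le d_0} x_j^2/(2\lambda_j)\big\}$ is the intensity of the SmBP evaluated at $x$. If we further specialize the above case to the Wiener process on $[0,1]$, we can show that $\Psi_{d_0}(x)$ is coherent with already known results (see, for instance, Li and Shao, 2001, Theorem 3.1 and Dereich et al., 2003, Example 5.1). In fact, the Karhunen-Loève decomposition of a Wiener process is known to be

$$W(t) = \sum_{j=1}^{\infty}\sqrt{\lambda_j}\,Z_j\,\xi_j(t), \qquad t\in[0,1],$$

with

$$Z_j \sim \mathcal{N}(0,1) \text{ i.i.d.}, \qquad \xi_j(t) = \sqrt{2}\sin\big((j-0.5)\pi t\big), \qquad \lambda_j = (j-0.5)^{-2}\pi^{-2},$$

and it is known that

$$\varphi(x,\varepsilon) \sim \exp\Big\{-\frac{1}{2}\int_0^1 x'(t)^2\,dt\Big\}\,\frac{4\varepsilon}{\sqrt{\pi}\,\exp\{1/(8\varepsilon^2)\}}, \qquad \varepsilon\to 0,$$

where $x(t)$ is sufficiently smooth. Thus, the latter must be equivalent to $\Psi_d(x)\cdot\mathcal{V}_d(\varepsilon)$ as $\varepsilon$ goes to zero and for any $d\ge d_0$. Since we are interested in the definition of an intensity, we do not care about the volumetric part and we just compare the spatial parts. Given $x(t) = \sum_{j=1}^{d_0} b_j\,\xi_j(t)$ where $b_j\in\mathbb{R}$, then $x_j$ equals $b_j$ for $j=1,\dots,d_0$ and is null otherwise. Moreover, straightforward computations lead to

$$\exp\Big\{-\frac{1}{2}\int_0^1 x'(t)^2\,dt\Big\} = \exp\Big\{-\frac{1}{2}\sum_{j\le d_0}\frac{b_j^2}{\lambda_j}\Big\} = \Psi_{d_0}(x).$$

The special case of the Wiener process is exploited in Section 5.2 to construct numerical examples on the estimation of the intensity.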
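The "straightforward computations" in D.1 can be checked numerically. The sketch below, which assumes the Wiener eigenelements $\xi_j(t)=\sqrt{2}\sin((j-0.5)\pi t)$ and $\lambda_j=((j-0.5)\pi)^{-2}$, compares a midpoint-rule approximation of $\int_0^1 x'(t)^2\,dt$ with $\sum_j b_j^2/\lambda_j$ for an arbitrary coefficient vector; the function names and the test coefficients are illustrative.

```python
import math


def energy_integral(b, n_grid=2000):
    """Midpoint-rule value of the integral over [0, 1] of x'(t)^2, where
    x(t) = sum_j b[j-1] * sqrt(2) * sin((j - 0.5) * pi * t)."""
    h = 1.0 / n_grid
    total = 0.0
    for k in range(n_grid):
        t = (k + 0.5) * h
        dx = sum(bj * math.sqrt(2.0) * (j - 0.5) * math.pi
                 * math.cos((j - 0.5) * math.pi * t)
                 for j, bj in enumerate(b, start=1))
        total += dx * dx * h
    return total


def weighted_sum(b):
    """sum_j b_j^2 / lambda_j with lambda_j = ((j - 0.5) * pi)^(-2)."""
    return sum(bj ** 2 * ((j - 0.5) * math.pi) ** 2
               for j, bj in enumerate(b, start=1))


if __name__ == "__main__":
    b = [0.5, -1.0, 0.25]
    print(energy_integral(b), weighted_sum(b))
```

The two quantities agree because the derivatives of the eigenfunctions are orthogonal on $[0,1]$ with squared norm $1/\lambda_j$.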

D.2 - The case $\mathcal{R}(x,\varepsilon,d) = 1$. Consider a $d_0$-dimensional process $X$ (that is, a process taking values in a $d_0$-dimensional subspace of the Hilbert space). Then $\lambda_j = 0$ for any $j\ge d_0+1$, (A-2) leads to $x_j = \theta_j = 0$ for any $j\ge d_0+1$, and hence $\mathcal{R}(x,\varepsilon,d) = 1$ for any $d\ge d_0$. Moreover, Theorem 1 can be applied only for $d\le d_0$ because $f_{d_0+1}$ is not strictly positive and hence (A-3) fails. Consequently, $\varphi(x,\varepsilon)\sim f_{d_0}(x)V_{d_0}(\varepsilon)$, that is the usual first order approximation for the $d_0$-dimensional process, and $f_{d_0}$ is the intensity of the SmBP of the process.

D.3 - Changing the basis. Let $\{\tilde{\xi}_j\}_{j=1}^{\infty}$ be an orthonormal basis of the Hilbert space arranged so that the sequence $\mathrm{Var}(\langle X,\tilde{\xi}_j\rangle) = \tilde{\lambda}_j$ is sorted in descending order. Then Theorem 1 still holds.


3 Approximation results when d depends on $\varepsilon$

The goal of this section is to establish which conditions on $X$ allow one to simplify (9) and get

$$\varphi(x,\varepsilon) \sim f_d(x)\,V_d(\varepsilon), \qquad \text{as } \varepsilon\to 0.$$

In what follows, this result is achieved by combining Theorem 1 with the limit behaviour of $\mathcal{R}(x,\varepsilon,d)$, which is strictly related to that of the real random variable $S(x,\varepsilon,d)$ defined in Equation (6). On the one hand, when $d$ and $x$ are fixed, $S$ diverges as $\varepsilon$ tends to zero. On the other hand, if $\varepsilon$ and $x$ are fixed, the larger $d$, the smaller $S$. Hence, one may wonder whether it is possible to balance these two effects (for instance, by tying the behaviour of $d$ to that of $\varepsilon$) in order to have, for any $x$,

$$\begin{cases}\mathcal{R}(x,\varepsilon,d)\to 1,\\ \varphi(x,\varepsilon)\sim f_d(x)V_d(\varepsilon),\end{cases} \qquad \varepsilon\to 0,\ d(\varepsilon)\to\infty. \qquad (10)$$

To do this, let us consider the following limit behaviour of $\mathcal{R}$, as $\varepsilon$ goes to zero and $d$ diverges to infinity.


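Proposition 2 Assume (A-2) and suppose that $\sum_{j\ge d+1}\lambda_j = o(1/d)$, as $d$ goes to infinity. Then it is possible to choose $d = d(\varepsilon)$ so that it diverges to infinity as $\varepsilon$ tends to zero and

$$d\sum_{j\ge d+1}\lambda_j = o(\varepsilon^2). \qquad (11)$$

Moreover, as $\varepsilon\to 0$,

$$0 \le 1 - \mathcal{R}(x,\varepsilon,d) \le \frac{C_1(d+2)}{2\varepsilon^2}\sum_{j\ge d+1}\lambda_j = o(1). \qquad (12)$$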

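Let us now consider the following inequality

$$|\varphi(x,\varepsilon) - f_d(x)V_d(\varepsilon)| \le |\varphi(x,\varepsilon) - \varphi_d(x,\varepsilon)| + |\varphi_d(x,\varepsilon) - f_d(x)V_d(\varepsilon)|,$$

that, thanks to (8), (12) and to the fact that $0 < \mathcal{R}\le 1$, leads to

$$\left|\frac{\varphi(x,\varepsilon)}{f_d(x)V_d(\varepsilon)} - 1\right| \le C_2\,\frac{\varepsilon^2}{2\lambda_d}\,\mathcal{R}(x,\varepsilon,d) + |\mathcal{R}(x,\varepsilon,d) - 1| \le C_2\,\frac{\varepsilon^2}{2\lambda_d} + \frac{C_1(d+2)}{2\varepsilon^2}\sum_{j\ge d+1}\lambda_j. \qquad (13)$$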

Thus, the desired result holds whenever the right-hand side vanishes as $\varepsilon$ goes to zero, i.e. if there exists $d = d(\varepsilon)$ such that the following two conditions hold at the same time:

$$\varepsilon^2 = o(\lambda_d), \qquad \text{and} \qquad (d+2)\sum_{j\ge d+1}\lambda_j = o(\varepsilon^2). \qquad (14)$$

In order to obtain (10), we combine the conditions in (14) (plugging the first into the second), and we get that the eigenvalues must satisfy the hyper-exponential decay rate defined by

$$d\,\frac{\sum_{j\ge d+1}\lambda_j}{\lambda_d} = o(1), \qquad \text{as } d\to\infty. \qquad (15)$$

The latter highlights a trade-off between the approximation errors provided by Theorem 1 and Proposition 2. This trade-off is strictly related to a suitable balance between the $d$-th eigenvalue and the terms in the tail of the spectrum of the covariance operator.

It is worth noting that hyper-exponential decay of the eigenvalues is a necessary condition to guarantee that the right-hand side of (13) vanishes. One may wonder whether it is sufficient as well, that is, in other words, whether it is possible to define $d = d(\varepsilon)$ so that the errors in (14) vanish at the same time as $\varepsilon$ goes to zero. A positive answer is provided by the following.


Theorem 3 Consider the hypotheses of Theorem 1 and assume that the eigenvalues decay hyper-exponentially. It is possible to choose $d = d(\varepsilon)$ so that, if $\varepsilon\to 0$, then $d\to\infty$ and

$$\varphi(x,\varepsilon) = f_d(x)\,V_d(\varepsilon) + o\big(f_d(x)\,V_d(\varepsilon)\big). \qquad (16)$$

In what follows, in order to discuss assumptions and consequences of the above result, some issues are developed.

D.4 - Again about the intensity of the SmBP. Because of the relation between $d$ and $\varepsilon$, in general, (16) does not allow one to define an intensity as commonly intended: anyway, $f_d$ being the only term depending on $x$, it can be considered as a surrogate intensity.

Gaussian processes, or suitable generalizations of them, provide examples for which $f_d$ leads to a genuine intensity. At first, consider a Gaussian process $X$; then, for any $x\in\mathcal{L}^2_{[0,1]}$ and as $\varepsilon$ goes to zero, $\varphi(x,\varepsilon)\sim\Psi_d(x)\cdot\mathcal{V}_d(\varepsilon)$; see D.1. When $d$ tends to infinity, $\Psi_d(x)$ tends to $\Psi(x) = \exp\big\{-\sum_{j\ge 1} x_j^2/(2\lambda_j)\big\}$ for any $x\in\mathcal{L}^2_{[0,1]}$, and $\Psi(x)$ plays the role of the intensity of the small-ball probability at $x$. Note that $\Psi(x)$ is not null if and only if $x$ belongs to $RKHS(X)$, see (4). In particular, if we consider the Wiener process on $[0,1]$, it can be proven, with arguments analogous to those in D.1 and for any smooth real function $x$ on $[0,1]$, that

$$\Psi(x) = \exp\Big\{-\frac{1}{2}\int_0^1 x'(t)^2\,dt\Big\}.$$

The latter is coherent with already known results; see, for instance, Li and Shao, 2001, Theorem 3.1.

Another situation in which an intensity for the SmBP can also be defined occurs when the PCs are independent, each one with density belonging to a subfamily of the exponential power (or generalized normal) distribution (see e.g. Box and Tiao, 1973), that is, proportional to $\exp\{-(|x_j|/\sqrt{\lambda_j})^q/2\}$, with $q\ge 2$. In this case, $\Psi$ is given by

$$\Psi(x) = \exp\Big\{-\frac{1}{2}\sum_{j=1}^{\infty}\Big(\frac{|x_j|}{\sqrt{\lambda_j}}\Big)^{q}\Big\}, \qquad \text{for any } x\in\mathcal{L}^2_{[0,1]},$$

and it is not null if $x$ is in $H(q) = \big\{x\in\mathcal{L}^2_{[0,1]} : \sum_{j\ge 1}\big(|x_j|/\sqrt{\lambda_j}\big)^q < \infty\big\}$, which includes $RKHS(X)$ when $q\ge 2$.

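D.5 - An example of hyper-exponential decay. Consider $\lambda_j = \exp\{-\beta j^{\alpha}\}$ with $\beta>0$ and $\alpha>1$. In this case, for any real number $n>1$, it holds that

$$d\,\frac{\sum_{j\ge d+1}\lambda_j}{\lambda_d} \le d^{n}\,\frac{\sum_{j\ge d+1}\lambda_j}{\lambda_d} \to 0, \qquad \text{as } d\to\infty. \qquad (17)$$

In fact, some algebra and the Bernoulli inequality give

$$d^{n}\,\frac{\sum_{j\ge d+1}\lambda_j}{\lambda_d} = d^{n}\sum_{j\ge 1}\exp\big\{\beta d^{\alpha}\big(1-(1+j/d)^{\alpha}\big)\big\} \le d^{n}\sum_{j\ge 1}\exp\big\{-\beta\alpha d^{\alpha-1} j\big\}.$$
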

Since the series on the right-hand side is geometric with ratio $q_d = \exp\{-\beta\alpha d^{\alpha-1}\}$, it equals $q_d/(1-q_d)$ and, because $\alpha>1$, $d^{n} q_d/(1-q_d)\to 0$ as $d\to\infty$; hence (17) is obtained.
3.1 Weakening the eigenvalues decay rate

If, on the one hand, the factorization (16) provides the exact form of the volumetric part (that is $V_d(\varepsilon)$, the volume of the $d$-dimensional ball of radius $\varepsilon$), on the other hand it is obtained at the cost of the hyper-exponential eigenvalues decay (15). If the eigenvalues decay rate is weakened, the exact form of the volumetric part no longer appears. In particular, we focus our interest on the following two further behaviours of the spectrum of $\Sigma$:

• "super-exponential":

$$\lambda_d^{-1}\sum_{j\ge d+1}\lambda_j = o(1), \qquad \text{as } d\to\infty,$$

or, equivalently,

$$\frac{\lambda_{d+1}}{\lambda_d}\to 0, \qquad \text{as } d\to\infty. \qquad (18)$$

• "exponential": there exists a positive constant $C$ so that

$$\lambda_d^{-1}\sum_{j\ge d+1}\lambda_j < C, \qquad \text{for any } d\in\mathbb{N}. \qquad (19)$$

It is possible to show that (15) $\Rightarrow$ (18) $\Rightarrow$ (19), but the converses do not hold. For instance, for any $\alpha>1$ and $\beta>0$, $\lambda_j = \exp\{-\beta j\}$ decays exponentially but not super-exponentially, $\lambda_j = \exp\{-\beta j\ln(\ln(j))\}$ decays super-exponentially but not hyper-exponentially, while $\lambda_j = \exp\{-\beta j^{\alpha}\}$ decays hyper-exponentially.
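The three decay classes can be told apart numerically through the tail ratio $T(d) = \lambda_d^{-1}\sum_{j\ge d+1}\lambda_j$: it stays bounded in the exponential case, vanishes in the super-exponential case, and $d\,T(d)$ vanishes in the hyper-exponential case. The sketch below evaluates $T(d)$ from log-eigenvalues to avoid underflow; the choices $\beta=1$, $\alpha=2$ and the truncation of the tail at 500 terms are illustrative assumptions.

```python
import math


def tail_ratio(log_lam, d, n_terms=500):
    """sum_{j>d} lambda_j / lambda_d, from log-eigenvalues (tail truncated)."""
    base = log_lam(d)
    return sum(math.exp(log_lam(d + j) - base) for j in range(1, n_terms + 1))


def exponential(j, beta=1.0):
    return -beta * j


def super_exponential(j, beta=1.0):
    return -beta * j * math.log(math.log(j))  # requires j >= 3


def hyper_exponential(j, beta=1.0, alpha=2.0):
    return -beta * j ** alpha


if __name__ == "__main__":
    for name, f in [("exponential", exponential),
                    ("super-exponential", super_exponential),
                    ("hyper-exponential", hyper_exponential)]:
        for d in (10, 100, 1000):
            t = tail_ratio(f, d)
            print(name, d, t, d * t)
```

In the super-exponential example $T(d)$ decreases only like a power of $1/\ln d$, so $d\,T(d)$ still diverges, in line with the claim that it is not hyper-exponential.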

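The following theorem holds.

Theorem 4 Under the hypotheses of Theorem 1, as $\varepsilon$ tends to zero, it is possible to choose $d = d(\varepsilon)$ diverging to infinity so that $\varphi(x,\varepsilon)\sim f_d(x)\,\phi(\varepsilon,d)$, where

• in the super-exponential case

$$\phi(\varepsilon,d) = \exp\Big\{\frac{1}{2}\,d\,\big[\log(2\pi e\,\varepsilon^2) - \log(d) + o(1)\big]\Big\},$$

• in the exponential case

$$\phi(\varepsilon,d) = \exp\Big\{\frac{1}{2}\,d\,\big[\log(2\pi e\,\varepsilon^2) - \log(d) + \delta(d,\alpha)\big]\Big\},$$

where $\delta(d,\alpha)$ is such that $\lim_{\alpha\to\infty}\limsup_{s\to\infty}\delta(s,\alpha) = 0$ and $\alpha$ is a parameter chosen so that $\lambda_d^{-1}\varepsilon^2\le\alpha^2$.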

In other words, if the decay rate changes, the volume factor changes as well. In particular, $f_d(x)$ preserves the role of a surrogate intensity whereas $\phi(\varepsilon,d)$ replaces $V_d(\varepsilon)$ as the volumetric term in the factorization. Observe that $\phi(\varepsilon,d)$ depends on terms (namely, $o(1)$ and $\delta(d,\alpha)$) that are implicitly defined and for which we just know the asymptotic behaviour. It is worth noting that, in the exponential setting, Discussion D.4 about Gaussian and exponential power processes still holds with minor modifications.
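As a side remark, the exact volume $V_d(\varepsilon)$ has the same leading form as $\phi(\varepsilon,d)$: by Stirling's formula, $\log V_d(\varepsilon) = \frac{d}{2}[\log(2\pi e\varepsilon^2) - \log d] - \frac12\log(\pi d) + O(1/d)$. The sketch below, with illustrative function names, measures the gap between the two exponents after rescaling by $2/d$, which plays the role of an $o(1)$ term.

```python
import math


def log_ball_volume(d, eps):
    """log V_d(eps), computed with lgamma to avoid underflow."""
    return d * math.log(eps) + (d / 2) * math.log(math.pi) - math.lgamma(d / 2 + 1)


def log_phi_leading(d, eps):
    """(d/2) * [log(2*pi*e*eps^2) - log(d)]: the exponent without the o(1) term."""
    return (d / 2) * (math.log(2 * math.pi * math.e * eps ** 2) - math.log(d))


def bracket_gap(d, eps):
    """(2/d) * (log V_d - leading exponent); tends to zero as d grows."""
    return (2 / d) * (log_ball_volume(d, eps) - log_phi_leading(d, eps))


if __name__ == "__main__":
    for d in (10, 100, 1000, 10000):
        print(d, bracket_gap(d, 0.1))
```

The gap behaves like $-\log(\pi d)/d$ and does not depend on $\varepsilon$.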
D.6 - About slower eigenvalues decay rates. This theoretical problem is still partially open. In fact, apart from Gaussian processes and, in particular, the Wiener one (whose eigenvalues decay arithmetically but whose intensity, evaluated at smooth $x$, can be defined as illustrated in D.1), to the best of our knowledge there are no other attempts to provide asymptotic factorizations for the SmBP of processes whose eigenvalues decay slower than exponentially. Hence, at this stage, if no information about the probability law is available, a solution is to go back to Theorem 1 in order to manage the dependence on $x$ and $\varepsilon$ in $\mathcal{R}(x,\varepsilon,d)$.

D.7 - Optimal basis. Although the factorization results in Theorems 3 and 4 are stated by using the Karhunen-Loève (or PCA) basis, they hold for any orthonormal basis ordered according to the decreasing values of the variances of the projections, provided that these decay sufficiently fast. In particular, using the same notation as in D.3, if the sequence $\{\tilde{\lambda}_j\}_{j=1}^{\infty}$ has an exponential decay then Theorem 4 still holds and a surrogate intensity can be defined. Note that the variances obtained when one uses the PCA basis exhibit, by construction, the fastest decay; in this sense the choice of this basis can be considered optimal.


4 Estimation of the surrogate intensity 


Besides their theoretical interest, Theorems 1, 3 and 4 turn out to be useful from the applications point of view as well. In fact, under suitable assumptions, they theoretically justify the use of $f_d$ as a surrogate intensity for Hilbert-valued processes in statistical applications as done, for instance, within classification problems by Bongiorno and Goia (2016). This fact leads immediately to the main task of this section: to make the factorization results usable for practical purposes and, in particular, to introduce an estimator of the surrogate intensity $f_d$.

Consider a sample of random curves $\{X_i,\ i=1,\dots,n\}$ which we suppose i.i.d. as $X$. In principle, if the sequence of eigenfunctions $\{\xi_j\}_{j=1}^{\infty}$ were known, one should consider the empirical version of the vector of the first $d$ principal components $\theta_i = (\theta_{1i},\dots,\theta_{di})' \in \mathbb{R}^d$ with $\theta_{ji} = \langle X_i - \mathbb{E}[X_i],\xi_j\rangle$, and then introduce the classical kernel density estimate of $f_d$ as follows:

$$f_{d,n}(\Pi_d x) = f_n(x) = \frac{1}{n}\sum_{i=1}^{n} K_{H_n}\big(\|\Pi_d(X_i - x)\|\big), \qquad (20)$$

where $K_{H_n}(u) = \det(H_n)^{-1/2}\,K\big(H_n^{-1/2}u\big)$, $K$ is a kernel function and $H_n = H_{n,d}$ is a symmetric positive semi-definite $d\times d$ matrix (with an abuse of notation, we dropped the dependence on $d$). In practice, equation (20) defines only a pseudo-estimate for $f_d$: indeed, the covariance operator $\Sigma$ and hence the sequence $\{\xi_j\}$ are unknown. Thus, to operationalize these pseudo-estimates it is necessary to consider the estimates $\widehat{\theta}_i$ and $\widehat{\Pi}_d$ of $\theta_i$ and $\Pi_d$, respectively. In this view, consider the sample versions of $\mu_X$ and $\Sigma$, respectively:

$$\overline{X}_n(t) = \frac{1}{n}\sum_{i=1}^{n}X_i(t), \qquad \text{and} \qquad \widehat{\Sigma}_n[\cdot] = \frac{1}{n}\sum_{i=1}^{n}\langle X_i - \overline{X}_n,\cdot\rangle\,(X_i - \overline{X}_n).$$
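The eigenelements $\{\widehat{\lambda}_j,\widehat{\xi}_j\}_{j=1}^{\infty}$ of $\widehat{\Sigma}_n$ provide an estimation of the eigenelements $\{\lambda_j,\xi_j\}_{j=1}^{\infty}$ of $\Sigma$, and $\langle X_i - \overline{X}_n,\widehat{\xi}_j\rangle = \widehat{\theta}_{ji}$ estimates $\theta_{ji}$ (the asymptotic behaviour of these estimators has been widely studied; see e.g. Bosq, 2000). Thus, plugging these estimates (or the estimate of the eigen-projectors) in (20), we get the kernel density estimator:

$$\widehat{f}_{d,n}\big(\widehat{\Pi}_d x\big) = \widehat{f}_n(x) = \frac{1}{n}\sum_{i=1}^{n} K_{H_n}\big(\|\widehat{\Pi}_d(X_i - x)\|\big), \qquad \widehat{\Pi}_d x\in\mathbb{R}^d. \qquad (21)$$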
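A minimal sketch of an estimator of this type, in the special case $H_n = h^2 I$, is given below. It assumes that the (estimated) PC scores are already available as $d$-dimensional vectors and uses an Epanechnikov-type radial kernel, an illustrative choice that is a bounded, Lipschitz density supported on the unit ball; the simulation code in the paper relies instead on the `kde` function of the R package `ks`. All Python names are hypothetical.

```python
import math
import random


def radial_kernel(u, d):
    """Epanechnikov-type density on the unit ball of R^d, as a function of u = ||z||."""
    if u < 0.0 or u > 1.0:
        return 0.0
    unit_ball = math.pi ** (d / 2) / math.gamma(d / 2 + 1)
    return (d + 2) / (2 * unit_ball) * (1.0 - u * u)


def kde(scores, x, h):
    """Kernel estimate at x from a list of d-dimensional score vectors, H_n = h^2 * I."""
    d = len(x)
    total = 0.0
    for theta in scores:
        dist = math.sqrt(sum((ti - xi) ** 2 for ti, xi in zip(theta, x)))
        total += radial_kernel(dist / h, d)
    return total / (len(scores) * h ** d)


# Toy check with d = 1 and standard Gaussian scores (an assumed example).
rng = random.Random(1)
scores = [(rng.gauss(0.0, 1.0),) for _ in range(4000)]
estimate_at_zero = kde(scores, (0.0,), 0.3)
print(estimate_at_zero, 1.0 / math.sqrt(2.0 * math.pi))
```

With $H_n = h^2 I$ the scaling $\det(H_n)^{-1/2}$ reduces to $h^{-d}$, which is the normalization used in `kde`.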


Since $\widehat{\lambda}_1\ge\widehat{\lambda}_2\ge\dots\ge\widehat{\lambda}_n\ge 0 = \widehat{\lambda}_{n+1} = \dots$, one could choose $d = n$, but in practice this is not an appropriate choice: the curse of dimensionality in estimating a multivariate density, combined with a bad estimation of the PCs associated with the smallest eigenelements, would jeopardize the quality of estimation. Hence, a suitable dimension $d\ll n$ has to be identified. A first naive way consists in selecting the smallest $d$ for which the Fraction of Explained Variance (defined as $FEV(d) = \sum_{j\le d}\widehat{\lambda}_j\big/\sum_{j\ge 1}\widehat{\lambda}_j$) exceeds a fixed threshold. In any case, the problem of selecting the dimension $d$ to be used in practice is still open and needs further developments that go beyond the scope of this paper.
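The naive selection rule can be sketched as follows, assuming the estimated eigenvalues are given in descending order; the function names and the default threshold are illustrative and not prescribed by the paper.

```python
def fev(eigenvalues, d):
    """Fraction of explained variance of the first d eigenvalues."""
    return sum(eigenvalues[:d]) / sum(eigenvalues)


def select_dimension(eigenvalues, threshold=0.9):
    """Smallest d whose FEV reaches the threshold (eigenvalues sorted descending)."""
    total = sum(eigenvalues)
    partial = 0.0
    for d, lam in enumerate(eigenvalues, start=1):
        partial += lam
        if partial / total >= threshold:
            return d
    return len(eigenvalues)


if __name__ == "__main__":
    lam_hat = [4.0, 2.0, 1.0, 1.0]
    print(select_dimension(lam_hat, 0.7), select_dimension(lam_hat, 0.9))
```

The rule only looks at the spectrum, so it ignores how well each additional PC is estimated, which is one reason why the choice of $d$ remains an open issue.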

If, from a computational point of view, the replacement of $\Pi_d$ with $\widehat{\Pi}_d$ in (21) is a natural way to manage the problem of estimating the surrogate density in practice, one may wonder whether that plug-in can influence the rate of convergence of the kernel estimator or, in other words, whether using $\widehat{f}_n$ instead of $f_n$ has no effect on this rate. To answer this question, we study the behaviour of $\mathbb{E}\big[\big(f_d(x) - \widehat{f}_n(x)\big)^2\big]$ as $n$ goes to infinity. For the sake of simplicity, we confine the study to the special case $H_n = h_n^2 I$, where $I$ is the identity matrix, we assume $d$ fixed and independent of the observed data, and we suppose that

(B-1) the density $f_d(x)$ is positive and $p$ times differentiable at $x\in\mathbb{R}^d$ with $p\ge 2$;

(B-2) the sequence of bandwidths $h_n$ is such that

$$h_n\to 0 \qquad \text{and} \qquad \frac{n\,h_n^{d}}{\log n}\to\infty, \qquad \text{as } n\to\infty;$$

(B-3) the kernel $K$ is a Lipschitz, bounded, integrable density function with compact support $[0,1]$;

(B-4) the process $X$ satisfies the following condition: there exist two positive constants $s$ and $K$ such that, for all integers $m\ge 2$,

$$\mathbb{E}\big[\|X\|^{m}\big] \le \frac{m!}{2}\,s^{2}K^{m-2}.$$

The hypotheses (B-1), (B-2) and (B-3) are standard in the non-parametric framework, and $p\ge 2$ is required because of (A-3). Moreover, condition (B-4) holds for a wide family of processes including the Gaussian one.
Observe first that one can control the quadratic mean under study by inserting the pseudo-estimator (20): in fact, thanks to the triangle inequality,

$$\Big\{\mathbb{E}\big[\big(f_d(x) - \widehat{f}_n(x)\big)^2\big]\Big\}^{1/2} \le \Big\{\mathbb{E}\big[\big(f_d(x) - f_n(x)\big)^2\big]\Big\}^{1/2} + \Big\{\mathbb{E}\big[\big(f_n(x) - \widehat{f}_n(x)\big)^2\big]\Big\}^{1/2}. \qquad (22)$$

About the first term on the right-hand side of (22), it is well known in the literature (see for instance Wand and Jones, 1995) that, under assumptions (B-1), ..., (B-4), and taking the optimal bandwidth

$$c_1\,n^{-1/(2p+d)} \le h_n \le c_2\,n^{-1/(2p+d)}, \qquad (23)$$

where $c_1$ and $c_2$ are two positive constants, one gets the minimax rate:

$$\mathbb{E}\big[\big(f_d(x) - f_n(x)\big)^2\big] = O\big(n^{-2p/(2p+d)}\big),$$

uniformly in $\mathbb{R}^d$. Therefore, it is enough to control the second addend on the right-hand side of (22).
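The interplay between smoothness $p$, dimension $d$ and sample size $n$ in (23) and in the minimax rate can be made concrete with the short sketch below; the constant `c` and the function names are illustrative assumptions.

```python
def optimal_bandwidth(n, p, d, c=1.0):
    """Bandwidth of order n^(-1/(2p+d)), as in (23), with an assumed constant c."""
    return c * n ** (-1.0 / (2 * p + d))


def minimax_rate(n, p, d):
    """Order n^(-2p/(2p+d)) of the mean squared error."""
    return n ** (-2.0 * p / (2 * p + d))


if __name__ == "__main__":
    for d in (1, 2, 5, 10):
        print(d, optimal_bandwidth(1000, 2, d), minimax_rate(1000, 2, d))
```

For fixed $n$ and $p$ the rate deteriorates quickly as $d$ grows, which is the curse of dimensionality mentioned above.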

The following theorem states that, assuming a suitable degree of regularity for the density $f_d$ depending on $d$, the rate of convergence in quadratic mean of $\widehat{f}_n(x)$ towards $f_n(x)$ is negligible with respect to that of $f_n(x)$ towards $f_d(x)$. Thus, using the estimated principal components instead of the empirical ones does not affect the rate of convergence.


Theorem 5 Assume (B-1), ..., (B-4) with $p > 2\vee 3d/2$ and consider the optimal bandwidth (23). Then, as $n$ goes to infinity,

$$\mathbb{E}\big[\big(f_n(x) - \widehat{f}_n(x)\big)^2\big] = o\big(n^{-2p/(2p+d)}\big),$$

uniformly in $\mathbb{R}^d$.

Formulation (21) requires that each random curve $X_i(t)$ is observed entirely in the continuum and without noise over $[0,1]$. In practice, a discretization is inevitable, as the curves are available only at discrete design points $\{\tau_{i,1},\tau_{i,2},\dots,\tau_{i,p_i}\}\subset[0,1]$ that are not necessarily the same for each $i$. Thus, it is necessary to introduce some numerical approximation to compute the estimates of the $d$ principal components involved.

When each curve is observed without errors over the same fixed equispaced grid $\{\tau_1 = 0,\tau_2,\dots,\tau_{p-1},\tau_p = 1\}$, with $p$ sufficiently large, one can simply replace integrals by summations: the empirical covariance operator is approximated by a matrix and its eigenelements are computed by standard numerical algorithms (see Rice and Silverman, 1991). This is the approach we follow in the simulations in Section 5 below.
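The "integrals replaced by summations" step can be sketched as follows. The code is a minimal illustration under stated assumptions, not the implementation used in the paper: it extracts only the leading eigenpair of the discretized empirical covariance operator by power iteration, uses a plain Riemann sum on an equispaced grid, and is tried on an assumed toy process $X_i(t) = a_i\sqrt{2}\sin(\pi t)$ with $a_i\sim\mathcal{N}(0,1)$. A full analysis would rely on a standard eigen-solver.

```python
import math
import random


def inner(g, h, step):
    """Riemann-sum approximation of the L2[0, 1] inner product."""
    return step * sum(a * b for a, b in zip(g, h))


def leading_eigenpair(curves, step, n_iter=20):
    """Leading eigenvalue, eigenfunction and scores of the empirical covariance."""
    n, p = len(curves), len(curves[0])
    mean = [sum(c[k] for c in curves) / n for k in range(p)]
    centered = [[c[k] - mean[k] for k in range(p)] for c in curves]

    def apply_cov(v):
        # (Sigma_n v)(t_k) = (1/n) * sum_i <X_i - mean, v> * (X_i - mean)(t_k)
        out = [0.0] * p
        for c in centered:
            w = inner(c, v, step) / n
            for k in range(p):
                out[k] += w * c[k]
        return out

    v = [1.0] * p
    for _ in range(n_iter):
        v = apply_cov(v)
        norm = math.sqrt(inner(v, v, step))
        v = [vk / norm for vk in v]
    lam = inner(apply_cov(v), v, step)
    scores = [inner(c, v, step) for c in centered]
    return lam, v, scores


# Toy data on the grid t_k = k / 100, k = 0, ..., 100.
rng = random.Random(0)
grid = [k / 100 for k in range(101)]
curves = [[a * math.sqrt(2.0) * math.sin(math.pi * t) for t in grid]
          for a in (rng.gauss(0.0, 1.0) for _ in range(200))]
lam_hat, xi_hat, theta_hat = leading_eigenpair(curves, step=0.01)
print(lam_hat)
```

The returned scores are the estimated first PCs, which can then be fed to a kernel density estimator.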

A more general situation occurs when the observed data are discretely sampled and corrupted by noise. In this case, one has the observed pairs $\{(\tau_{ij},Y_{ij}),\ i=1,\dots,n,\ j=1,\dots,p_i\}$, where $Y_{ij} = X_i(\tau_{ij}) + \varepsilon_{ij}$ and the errors $\varepsilon_{ij}$ are i.i.d. with zero mean and finite variance. If each $p_i\ge M_n$, where $M_n$ is a suitable sequence tending to infinity with $n$ (we refer to this case as dense functional data), a presmoothing step is run before conducting PCA, using the sample mean and covariance computed from the smoothed curves (see, for instance, Hall et al., 2006). In this case, under suitable hypotheses, the estimators of the eigenelements are root-$n$ consistent and first-order equivalent to the estimators obtained if the curves were directly observed (see Hall et al., 2006, Theorem 3).
5 Finite sample performances in estimating the surrogate density


We now illustrate, through numerical examples, the feasibility of the SmBP factorization approach by exploring how the proposed estimator works in a finite sample setting. We consider only two situations because of the difficulty in finding explicit expressions for the intensity. At first, we focus on a finite dimensional process for which the surrogate density is straightforwardly derived. Then, we deal with the Wiener process, which is one of the few infinite dimensional processes whose intensity can be derived, as already illustrated. In both cases, we study how the estimates behave when varying the sample size and $d$. All simulations rest on the density estimator defined in (21), and are performed on a suitable grid of the $d$-dimensional factor space: the algorithms are implemented in R, and exploit the function kde in the package ks (see Duong, 2007).


5.1 Finite dimensional setting 

Consider a one-dimensional random process whose trajectories are defined by

$$X(t) = a\sqrt{2/\pi}\,\sin(t), \qquad t \in [0,\pi],$$

where $a$ is a random variable with zero mean, unitary variance, density $f_a$ and cumulative
distribution function $F_a$. Given $x(t) = b\sqrt{2/\pi}\,\sin(t)$ with $b \in \mathbb{R}$, then, for any
$\varepsilon > 0$, $\varphi(x,\varepsilon) = F_a(b+\varepsilon) - F_a(b-\varepsilon)$, and, as $\varepsilon$ goes to zero, $\varphi(x,\varepsilon) \sim 2\varepsilon f_a(b)$. Such
asymptotic is the same obtained from the SmBP factorization: since the first PC is
$\theta = a$ and $x_1 = b$, it holds

$$\varphi(x,\varepsilon) \sim f_1(x_1)\,\varepsilon\,\pi^{1/2}/\Gamma(1/2+1) = 2 f_a(b)\,\varepsilon, \qquad \varepsilon \to 0,$$

with $f_a$ being the intensity of the SmBP.
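The agreement between the exact SmBP and its factorized approximation can be checked numerically. The sketch below assumes, for illustration only, that $a$ is standard Gaussian; the function names are hypothetical and chosen here.

```python
import math


def gaussian_cdf(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def gaussian_pdf(z):
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def small_ball_probability(b, eps):
    # Exact SmBP of the one-dimensional process: F_a(b + eps) - F_a(b - eps).
    return gaussian_cdf(b + eps) - gaussian_cdf(b - eps)


def factorized_approximation(b, eps, d=1):
    # Density of the first PC times the volume of the d-dimensional ball of radius eps.
    volume = eps ** d * math.pi ** (d / 2) / math.gamma(d / 2 + 1)
    return gaussian_pdf(b) * volume


if __name__ == "__main__":
    for eps in (0.5, 0.1, 0.01):
        ratio = small_ball_probability(0.5, eps) / factorized_approximation(0.5, eps)
        print(f"eps = {eps}: exact / approximation = {ratio:.6f}")
```

The ratio approaches one as the radius shrinks, which is the meaning of the symbol $\sim$ in the display above.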

In this framework, $f_a$ is compared with its estimates from a sample of curves,
for different $x(t)$, varying the nature of $a$ and the sample size. In practice, once $n$ and
the distribution of $a$ are set, we generated 1000 samples $\{X_i(t), i = 1,\dots,n\}$, i.i.d. as $X(t)$
(with $n = 50, 100, 200, 500, 1000$), where every curve is discretized over a mesh consisting
of 100 equispaced points $\{t_j = (j-1)\pi/99,\ j = 1,\dots,100\}$. For each sample, we
estimated the eigenfunction $\hat\xi(t)$, the associated PC $\hat\theta$ and its density via a kernel
procedure. Besides such samples, we built a set of curves $x_b(t) = b\sqrt{2/\pi}\,\sin(t)$ (discretized
on the same grid as $X(t)$), where $b$ is a suitable increasing sequence of real values. The
estimated density is then evaluated at the points $\hat x_b = \langle x_b(t), \hat\xi(t)\rangle$ and compared
with the true values $f_a(b)$ in terms of relative mean square prediction error
(RMSEP $= \sum_b (\hat f_{1,n}(\hat x_b) - f_a(b))^2 / \sum_b f_a(b)^2$) over the 1000 replications. Moreover, it is also
possible to investigate for what values $b$ the estimate of the surrogate density is better,
by using the absolute percentage error (APE $= |\hat f_{1,n}(\hat x_b) - f_a(b)| / f_a(b)$).

In the experiment we take $a$ distributed as: 1) standard Gaussian (that is, $a \sim
\mathcal{N}(0,1)$), 2) a standardized Student $t$ with 5 df (that is, $a \sim t(5)/\sqrt{5/3}$), 3) a standardized
Chi-square distribution with 8 df (that is, $a \sim (\chi^2(8) - 8)/4$). About $b$, we
n       $\mathcal{N}(0,1)$        $t(5)/\sqrt{5/3}$      $(\chi^2(8) - 8)/4$
        Mean (Std.)      Mean (Std.)      Mean (Std.)
50      3.235 (2.681)    5.921 (2.557)    4.081 (2.842)
100     1.860 (1.444)    4.775 (1.503)    2.401 (1.619)
200     1.091 (0.824)    4.138 (0.878)    1.422 (0.887)
500     0.546 (0.355)    3.737 (0.477)    0.753 (0.443)
1000    0.330 (0.220)    3.606 (0.327)    0.453 (0.233)

Table 1: Mean and standard deviation (in parentheses) of RMSEP (Results $\times 10^{-2}$) for Gaussian, $t$ and
$\chi^2$ distributions, computed over 1000 Monte Carlo replications varying the sample size $n$.


Figure 1: Absolute percentage errors in estimating $f_a(b)$ varying $b$ for Normal, $t$ and
$\chi^2$ distributions respectively.

used sequences consisting of 160 equispaced points, over the interval $[-4,4]$ for the
distributions 1) and 2), and $[-2,6]$ for the asymmetric distribution 3).

The RMSEP values ($\times 10^{-2}$) obtained under the different experimental conditions are
collected in Table 1. As expected, results improve as the sample size increases: that is due
both to the better estimates of the projections $\hat\theta$ and $\hat x_b$ and to the better performances of
the kernel estimator. On the other hand, differences due to the shape of distributions
occur: long tails and asymmetries produce a deterioration in estimates.

The APE for some selected values $b$ when $n = 200$ are reproduced in Figure 1. As
one can expect, the quality of the estimate worsens at the edges of the distributions, when
$b$ is rather far from zero. This fact is connected to the limitations of the kernel density
estimator in evaluating the tails of distributions.
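One replication of this experiment can be sketched in a simplified form. The sketch below is not the procedure used for Table 1: it assumes a Gaussian $a$, skips the estimation of the eigenfunction (it uses the true scores $\theta = a$ and $x_1 = b$), and replaces the kde function of the R package ks by a plain Gaussian kernel estimator with a rule-of-thumb bandwidth. All function names are hypothetical.

```python
import math
import random


def gaussian_pdf(z):
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def gaussian_kde(sample, bandwidth):
    """Return a one-dimensional Gaussian kernel density estimate as a function."""
    norm = len(sample) * bandwidth * math.sqrt(2.0 * math.pi)

    def density(u):
        return sum(math.exp(-0.5 * ((u - s) / bandwidth) ** 2) for s in sample) / norm

    return density


def rmsep(n, b_grid, rng):
    """Relative mean square prediction error for one simulated sample of size n."""
    scores = [rng.gauss(0.0, 1.0) for _ in range(n)]  # true first PC (assumption)
    mean = sum(scores) / n
    sd = math.sqrt(sum((s - mean) ** 2 for s in scores) / (n - 1))
    bandwidth = 1.06 * sd * n ** (-0.2)  # rule-of-thumb bandwidth (assumption)
    f_hat = gaussian_kde(scores, bandwidth)
    num = sum((f_hat(b) - gaussian_pdf(b)) ** 2 for b in b_grid)
    den = sum(gaussian_pdf(b) ** 2 for b in b_grid)
    return num / den


if __name__ == "__main__":
    rng = random.Random(0)
    b_grid = [-4.0 + 8.0 * k / 159 for k in range(160)]  # 160 equispaced points
    for n in (50, 200, 1000):
        values = [rmsep(n, b_grid, rng) for _ in range(20)]
        print(n, sum(values) / len(values))
```

Averaging over more replications and swapping in other standardized distributions for $a$ would mimic the remaining columns of the table, up to the simplifications stated above.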

5.2 Infinite dimensional setting 

In this second experiment, we deal with an infinite dimensional setting in order to study 
how the estimation of the intensity of the SmBP behaves according to the sample size 
and the dimensional parameter d. To do this, keeping in mind ID.11 let us consider a 
Wiener process $X$ on $[0,1]$ and the smooth function $x(t) = \sum_{j=1}^{d_0} b_j \xi_j(t)$, with, for the
n       d = 1            d = 2            d = 3            d = 4             d = 5             d = 6
50      3.361 (2.506)    7.197 (3.728)    13.529 (7.338)   22.029 (12.052)   31.900 (15.871)   42.051 (19.159)
100     1.948 (1.198)    4.821 (2.585)    9.471 (5.535)    15.857 (8.711)    23.988 (12.272)   33.729 (15.871)
200     1.158 (0.719)    3.143 (1.601)    6.638 (3.773)    11.514 (6.295)    17.894 (9.476)    25.514 (13.098)
500     0.574 (0.334)    1.775 (0.932)    4.168 (2.363)    7.766 (4.232)     12.956 (6.884)    18.988 (9.284)
1000    0.345 (0.187)    1.149 (0.625)    2.822 (1.636)    5.858 (3.126)     10.091 (5.425)    15.292 (7.651)

Table 2: Mean and standard deviation (in parentheses) of RMSEP (Results $\times 10^{-2}$) for the
Wiener process, computed over 1000 Monte Carlo replications varying the sample size $n$ and
the dimension $d$.


sake of simplicity, $d_0 = 1$, that is

® (*) = sin ^ , t € [0,1] , (24) 

where 6 € M, so that the intensity is 'hrfp(x) = exp 6^/2|. 

The experiment follows a similar route as in Section 5.1. In a first step, we generated
1000 samples $\{X_i(t), i = 1,\dots,n\}$ (with $n = 50, 100, 200, 500, 1000$), where every curve
is discretized over 100 equispaced points $Q = \{t_j = (j-1)/99,\ j = 1,\dots,100\}$, and
160 fixed curves $x_b(t)$ generated according to (24) and discretized over $Q$ ($b$ is an
increasing sequence of equispaced points over the interval $[-4,4]$). In a second step,
for each sample, once the empirical eigenfunctions $\hat\xi_j(t)$ are obtained, we estimated $f_d$
(with $d = 1,\dots,6$) and we computed them at $(\hat x_{b,1},\dots,\hat x_{b,d})'$, where $\hat x_{b,j} = \langle x_b(t), \hat\xi_j(t)\rangle$.

Finally, we compared the estimated surrogate density with the true one in terms of
relative mean square prediction error (RMSEP) over the 1000 replications. The obtained
results, varying $n$ and $d$, are reported in Table 2.

As a general comment, one can observe that for each $d$ the RMSEP reduces (both
in mean and in variability) as $n$ increases, whereas, for each $n$, the RMSEP increases
(both in mean and in variability) with $d$. This shows how the curse of dimensionality
interferes in the kernel estimation procedure as soon as the dimension $d$ exceeds one.
To perceive the relation between $d$ and $n$, one has to read the table diagonally:
it is possible to use a large $d$ at the cost of using large samples. For instance, similarly
good results (at around 3%) are possible using $n = 50$ and $d = 1$, or $n = 200$ and $d = 2$,
or $n = 1000$ and $d = 3$. On the other hand, results benefit from the fact that
the spectrum of the process is rather concentrated. In fact, the Fraction of Explained
Variance (defined as $\mathrm{FEV}(d) = \sum_{j \le d} \lambda_j / \sum_{j \ge 1} \lambda_j$) is: FEV(1) = 0.811, FEV(2) =
0.901, FEV(3) = 0.933, FEV(4) = 0.950, FEV(5) = 0.960 and FEV(6) = 0.966. Hence,
good estimates for the surrogate density are already possible with $d = 1$ or $d = 2$, also
for medium size samples. In that sense, this experiment gives empirical evidence for
the use of FEV in selecting the dimensional parameter $d$, keeping in mind that a large
$d$ needs a large $n$ in order to have better estimates.
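The FEV values quoted above can be reproduced from the Karhunen-Loève eigenvalues of the Wiener process on $[0,1]$, $\lambda_j = ((j - 1/2)\pi)^{-2}$, whose sum equals $\int_0^1 t\,dt = 1/2$. The following short sketch (function names chosen here for illustration) performs the computation.

```python
import math


def wiener_eigenvalue(j):
    # j-th eigenvalue of the covariance operator of the Wiener process on [0, 1].
    return 1.0 / ((j - 0.5) ** 2 * math.pi ** 2)


def fev(d):
    # Fraction of Explained Variance of the first d principal components.
    total = 0.5  # trace of the covariance operator: integral of min(t, t) over [0, 1]
    return sum(wiener_eigenvalue(j) for j in range(1, d + 1)) / total


if __name__ == "__main__":
    for d in range(1, 7):
        print(f"FEV({d}) = {fev(d):.3f}")
```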


6 Proofs 


This section collects the proofs of the results stated above.

6.1 Proof of Theorem 1

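We are interested in the asymptotic behaviour, as $\varepsilon$ tends to zero, of the SmBP
of the process $X$, that is

$$\varphi(x,\varepsilon) = P(\|X - x\| \le \varepsilon) = P(\|X - x\|^2 \le \varepsilon^2) = P\Big(\sum_{j=1}^{+\infty} \langle X - x, \xi_j\rangle^2 \le \varepsilon^2\Big) = P\Big(\sum_{j=1}^{+\infty} (\theta_j - x_j)^2 \le \varepsilon^2\Big).$$

Let $S_1 = \sum_{j \le d} (\theta_j - x_j)^2$ and $S = \varepsilon^{-2} \sum_{j \ge d+1} (\theta_j - x_j)^2$ be the truncated series and
the scaled version of the remainder respectively. Thus, the SmBP is

$$\begin{aligned}
\varphi(x,\varepsilon) &= P(S_1 + \varepsilon^2 S \le \varepsilon^2) = P(S_1 \le \varepsilon^2 (1 - S)) \\
&= P(\{S_1 \le \varepsilon^2 (1 - S)\} \cap \{S \ge 1\}) + P(\{S_1 \le \varepsilon^2 (1 - S)\} \cap \{0 \le S < 1\}) \\
&= P(\{S_1 \le \varepsilon^2 (1 - S)\} \cap \{0 \le S < 1\}) \\
&= \int_0^1 \varphi(s|x,\varepsilon,d)\,dG(s) \qquad (25)
\end{aligned}$$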

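where $G$ is the cumulative distribution function of $S$. At first, for any $s \in (0,1)$,
let us consider $\varphi(s|x,\varepsilon,d)$, that is the SmBP about $\Pi_d x$ of the process $\Pi_d X$ in the
space spanned by $\{\xi_j\}_{j=1}^d$. In terms of $f_d(\cdot)$, the probability density function of $\vartheta =
(\vartheta_1,\dots,\vartheta_d)'$, it can be written as

$$\varphi(s|x,\varepsilon,d) = \int_D f_d(\vartheta)\,d\vartheta,$$

where $D = \{\vartheta \in \mathbb{R}^d : \sum_{j \le d} (\vartheta_j - x_j)^2 \le \varepsilon^2 (1 - s)\}$ is a $d$-dimensional ball
centered about $\Pi_d x = (x_1,\dots,x_d)$ with radius $\varepsilon\sqrt{1 - s}$. Now, consider the Taylor
expansion of $f = f_d$ about $\Pi x = \Pi_d x$,

$$f(\vartheta) = f(x_1,\dots,x_d) + \langle \vartheta - \Pi x, \nabla f(x_1,\dots,x_d)\rangle + \frac{1}{2} (\vartheta - \Pi x)' H_f(\Pi x + (\vartheta - \Pi x)t)\,(\vartheta - \Pi x),$$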

for some $t \in (0,1)$ and with $H_f$ denoting the Hessian matrix of $f$. (In general, $t$
depends on $\vartheta - \Pi x$, but we are not interested in its actual value because the
boundedness of the second derivatives of $f$ allows us to drop, in what follows, those
terms depending on $t$). Then we can write

$$\begin{aligned}
\varphi(s|x,\varepsilon,d) &= \int_D \Big[ f(x_1,\dots,x_d) + \langle \vartheta - \Pi x, \nabla f(x_1,\dots,x_d)\rangle + \frac{1}{2} (\vartheta - \Pi x)' H_f(\Pi x + (\vartheta - \Pi x)t)\,(\vartheta - \Pi x) \Big]\,d\vartheta \\
&= f(x_1,\dots,x_d) \int_D d\vartheta + \int_D \langle \vartheta - \Pi x, \nabla f(x_1,\dots,x_d)\rangle\,d\vartheta + \frac{1}{2} \int_D (\vartheta - \Pi x)' H_f(\Pi x + (\vartheta - \Pi x)t)\,(\vartheta - \Pi x)\,d\vartheta \\
&= f(x_1,\dots,x_d)\,I + \frac{1}{2} \int_D (\vartheta - \Pi x)' H_f(\Pi x + (\vartheta - \Pi x)t)\,(\vartheta - \Pi x)\,d\vartheta \qquad (26)
\end{aligned}$$

where $I = I(s,\varepsilon,d)$ denotes the volume of $D$, that is

$$I = \frac{\varepsilon^d \pi^{d/2}}{\Gamma(d/2 + 1)} (1 - s)^{d/2},$$

and the addend $\int_D \langle \vartheta - \Pi x, \nabla f(x_1,\dots,x_d)\rangle\,d\vartheta$ is null since $\langle \vartheta - \Pi x, \nabla f(x_1,\dots,x_d)\rangle$
is a linear functional integrated over the domain $D$, which is symmetric with respect to its center
$(x_1,\dots,x_d)$. Thus from (26), thanks to the boundedness of second
derivatives (5), the fact that symmetry arguments lead to $\int_D (\vartheta_i - x_i)(\vartheta_j - x_j)\,d\vartheta = 0$
for $i \ne j$, and monotonicity of eigenvalues, it follows


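$$\begin{aligned}
|\varphi(s|x,\varepsilon,d) - f(x_1,\dots,x_d) I| &= \frac{1}{2} \Big| \int_D \sum_{i \le d} \sum_{j \le d} (\vartheta_i - x_i)(\vartheta_j - x_j) \frac{\partial^2 f}{\partial \vartheta_i \partial \vartheta_j} (\Pi x + (\vartheta - \Pi x)t)\,d\vartheta \Big| \\
&\le \frac{1}{2} C_2 f(x_1,\dots,x_d) \Big| \int_D \sum_{i \le d} \sum_{j \le d} \frac{(\vartheta_i - x_i)(\vartheta_j - x_j)}{\sqrt{\lambda_i \lambda_j}}\,d\vartheta \Big| \\
&= \frac{1}{2} C_2 f(x_1,\dots,x_d) \int_D \sum_{j \le d} \frac{(\vartheta_j - x_j)^2}{\lambda_j}\,d\vartheta \\
&\le \frac{C_2}{2 \lambda_d} f(x_1,\dots,x_d) \int_D \sum_{j \le d} (\vartheta_j - x_j)^2\,d\vartheta.
\end{aligned}$$

Note that

$$\int_D \sum_{j \le d} (\vartheta_j - x_j)^2\,d\vartheta = \int_{\{\|\vartheta\| \le \varepsilon\sqrt{1-s}\}} \|\vartheta\|^2\,d\vartheta,$$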

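whose integrand is a radial function (i.e. a map $H : \mathbb{R}^d \to \mathbb{R}$ such that $H(\vartheta) = h(\|\vartheta\|_{\mathbb{R}^d})$
with $h : \mathbb{R} \to \mathbb{R}$), for which the following identity applies

$$\int_{\{\|\vartheta\| \le r\}} H(\vartheta)\,d\vartheta = \omega_{d-1} \int_0^r h(\rho) \rho^{d-1}\,d\rho,$$

where $\omega_{d-1}$ denotes the surface area of the sphere of radius 1 in $\mathbb{R}^d$. Hence

$$\int_D \sum_{j \le d} (\vartheta_j - x_j)^2\,d\vartheta = \frac{2 \pi^{d/2}}{\Gamma(d/2)} \int_0^{\varepsilon\sqrt{1-s}} \rho^{d+1}\,d\rho = \frac{d}{d+2}\,I \varepsilon^2 (1 - s) \le I \varepsilon^2,$$

where the latter inequality follows from the fact that $s \in [0,1)$. This leads to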
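The radial computation can be cross-checked numerically: integrating $\rho^2$ in polar coordinates over a ball of radius $r = \varepsilon\sqrt{1-s}$ must agree with $\frac{d}{d+2}$ times the volume of the ball times $r^2$. A minimal sketch, with function names chosen here for illustration:

```python
import math


def ball_volume(d, radius):
    # Volume of the d-dimensional ball: pi^(d/2) r^d / Gamma(d/2 + 1).
    return math.pi ** (d / 2) * radius ** d / math.gamma(d / 2 + 1)


def second_moment_radial(d, radius):
    # omega_{d-1} * integral_0^r rho^2 * rho^(d-1) d rho, with
    # omega_{d-1} = 2 pi^(d/2) / Gamma(d/2) the surface area of the unit sphere.
    surface = 2.0 * math.pi ** (d / 2) / math.gamma(d / 2)
    return surface * radius ** (d + 2) / (d + 2)


def second_moment_closed_form(d, radius):
    # d / (d + 2) * (volume of the ball) * r^2.
    return d / (d + 2) * ball_volume(d, radius) * radius ** 2


if __name__ == "__main__":
    for d in (1, 2, 3, 6):
        print(d, second_moment_radial(d, 0.3), second_moment_closed_form(d, 0.3))
```

For $d = 1$ and $r = 1$ both expressions reduce to $\int_{-1}^{1} u^2\,du = 2/3$, a quick sanity check of the constant $\omega_0 = 2$.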


$$|\varphi(s|x,\varepsilon,d) - f(x_1,\dots,x_d) I| \le C_2 \frac{\varepsilon^2 I}{2 \lambda_d} f(x_1,\dots,x_d). \qquad (27)$$


Come back to the SmBP (25),

$$\varphi(x,\varepsilon) = \int_0^1 f(x_1,\dots,x_d)\,I\,dG(s) + \int_0^1 \big(\varphi(s|x,\varepsilon,d) - f(x_1,\dots,x_d) I\big)\,dG(s), \qquad (28)$$

and note that, thanks to (27) and because $d$ is fixed, the second addend in the right-hand
side of (28) is infinitesimal with respect to the first addend


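$$\frac{\big| \int_0^1 \big(\varphi(s|x,\varepsilon,d) - f(x_1,\dots,x_d) I\big)\,dG(s) \big|}{\int_0^1 f(x_1,\dots,x_d)\,I\,dG(s)} \le \frac{C_2 \frac{\varepsilon^2}{2 \lambda_d} f(x_1,\dots,x_d) \int_0^1 I\,dG(s)}{f(x_1,\dots,x_d) \int_0^1 I\,dG(s)} = C_2 \frac{\varepsilon^2}{2 \lambda_d}.$$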


Noting that 


we obtain 

/ I{s,£,d)dG{s] 
Jo 

2 


where. 

\(p{x,£) 

E 

-(Pd{x,£)\ <C2 — ipd{x,£) 

m 

‘Pdix,£) = f{xi,.. 

^d^dl2 


(0 


Thus, since d is fixed, as e tends to zero. 


(p{x,e)= [ ip{s\x,e,d)dG {s) =ipd{x,£) + o ( 

Jo \f[Xl,. .. ,Xd) 

or, equivalently, $\varphi(x,\varepsilon) \sim \varphi_d(x,\varepsilon)$, which concludes the proof.


6.2 Proofs of Proposition 2 and Theorems 3 and 4

To prove Proposition 2 we need the following lemma.
Lemma 6 Assume (A-1) and (A-2). Then, it is possible to choose $d = d(\varepsilon)$ so that it
diverges to infinity as $\varepsilon$ tends to zero and

$$\sum_{j \ge d+1} \lambda_j = o(\varepsilon^2). \qquad (29)$$

Moreover, as $\varepsilon \to 0$, $S(x,\varepsilon,d) \to 0$, where the convergence holds almost surely, in the
$L^1$ norm and hence in probability.


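Proof. A possible choice for $d = d(\varepsilon)$ satisfying (29) can be, for a fixed $\delta > 0$, as
follows:

$$d = \min\Big\{ k \in \mathbb{N} : \sum_{j \ge k+1} \lambda_j \le \varepsilon^{2+\delta} \Big\}, \qquad \text{for any } \varepsilon > 0.$$

Such a minimum is well defined since the eigenvalue series is convergent.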
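A rule of this kind is easy to evaluate once the eigenvalues are known. The sketch below assumes that the tail threshold is $\varepsilon^{2+\delta}$ and, purely for illustration, uses the Wiener process eigenvalues $\lambda_j = ((j - 1/2)\pi)^{-2}$ (whose sum is $1/2$); the function names, the default $\delta = 0.1$ and the search cap `k_max` are assumptions introduced here.

```python
import math


def wiener_eigenvalue(j):
    return 1.0 / ((j - 0.5) ** 2 * math.pi ** 2)


def truncation_level(eps, delta=0.1, k_max=10 ** 6):
    """Smallest k >= 1 whose eigenvalue tail sum_{j > k} lambda_j is <= eps^(2 + delta)."""
    threshold = eps ** (2.0 + delta)
    partial = 0.0
    for k in range(1, k_max + 1):
        partial += wiener_eigenvalue(k)
        if 0.5 - partial <= threshold:  # 0.5 is the sum of all eigenvalues
            return k
    raise ValueError("no admissible truncation level found below k_max")


if __name__ == "__main__":
    for eps in (0.5, 0.2, 0.1, 0.05):
        print(eps, truncation_level(eps))
```

As the radius decreases the returned level grows, in line with the requirement that $d(\varepsilon)$ diverges as $\varepsilon$ tends to zero.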

Let us prove that $S$ converges to zero in probability. For any $k > 0$, by Markov
inequality and thanks to Equation (3),

$$P(|S| > k) = P(S > k) = P\Big( \sum_{j \ge d+1} (\theta_j - x_j)^2 > k \varepsilon^2 \Big) \le \frac{E\big[ \sum_{j \ge d+1} (\theta_j - x_j)^2 \big]}{k \varepsilon^2} \le \frac{C_1 \sum_{j \ge d+1} \lambda_j}{k \varepsilon^2}. \qquad (30)$$

Thanks to (29) we get the convergence in probability. Since $S = S(x,\varepsilon,d)$ is
non-increasing when $d$ increases,

$$P\Big( \sup_{j \ge d+1} |S(x,\varepsilon,j) - 0| > k \Big) = P(S(x,\varepsilon,d+1) > k)$$

holds for any $k > 0$ and any $x$. This fact, together with (30), guarantees the almost
sure convergence of $S$ to zero (e.g. Shiryayev, 1984, Theorem 10.3.1) as $\varepsilon$ tends to zero.
Moreover, the monotone convergence theorem guarantees the $L^1$ convergence. ■
Proof of Proposition 2. Note that if $d(\varepsilon)$ satisfies (11), then (29) and Lemma 6
hold. For a fixed $\delta > 0$, a possible choice of such $d = d(\varepsilon)$ can be

$$d = \min\Big\{ k \in \mathbb{N} : k \sum_{j \ge k+1} \lambda_j \le \varepsilon^{2+\delta} \Big\},$$

where the minimum is achieved thanks to the eigenvalues hyperbolic decay assumption.
At this stage, note that

$$0 \le E\big[ (1 - S)^{d/2} I_{\{S < 1\}} \big] \le 1$$
then, after some algebra, thanks to Bernoulli inequality (i.e. $(1 + s)^r \ge 1 + rs$ for $s \ge -1$
and $r \in \mathbb{R} \setminus (0,1)$), Markov inequality and Assumption (3), we have (for any $d \ge 2$)


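$$\begin{aligned}
0 \le 1 - E\big[ (1 - S)^{d/2} I_{\{S < 1\}} \big] &\le 1 - E\Big[ \Big(1 - \frac{d}{2} S\Big) I_{\{S < 1\}} \Big] = P(S \ge 1) + \frac{d}{2} E\big[ S\,I_{\{S < 1\}} \big] \\
&\le \frac{d + 2}{2 \varepsilon^2} E\Big[ \sum_{j \ge d+1} (\theta_j - x_j)^2 \Big] \le \frac{C_1 (d + 2)}{2 \varepsilon^2} \sum_{j \ge d+1} \lambda_j.
\end{aligned}$$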


Choosing $d$ according to (11) the thesis follows. ■

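Proof of Theorem 3. Thanks to hyper-exponentiality (15), there exists $d_0 \in \mathbb{N}$ so
that for any $d \ge d_0$

$$d \sum_{j \ge d+1} \lambda_j < \lambda_d.$$

Moreover, there exist $\delta_1, \delta_2 \in (0,1)$ (depending on $d$) for which, for any $d \ge d_0$

$$0 < d \sum_{j \ge d+1} \lambda_j < b(d, \{\lambda_j\}_{j \ge d+1}, \delta_1) < B(d, \{\lambda_j\}_{j \le d}, \delta_2) < \lambda_d, \qquad (31)$$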

where 

Kd,{X,}j>d+i,6i)= Id Y, > Bid,{Xj},<d, 62 ) = xY^. 

\ j>d+l / 

As instance, for a given d > do, fix 61 £ (0,1) and solve (l3T]l with respect to 62 , that 
is 62 G (min{0,/3(di)} , 1) where /3((5i) = 1 - (1 - (5i)ln Xj^ /In(Arf). As a 

consequence, for any e > 0 and for such a choice of di, 62 , the following minimum is 
well-defined 


d{e) = min {A: G N : b{k,{Xj}j>k+i, 6 i) < < B{k,{Xj}j<k, 62 )} ■ 

This guarantees that the right-hand side of (13) vanishes as $\varepsilon$ goes to zero.
To prove Theorem 4 we need the following Lemma.

Lemma 7 Assume (A-1 )\ and \(A-^ Then, as e ^ f), 

$\mathcal{R}(x,\varepsilon,d)^{1/d} \to 1$, or, $\log(\mathcal{R}(x,\varepsilon,d)) = o(d)$.


(32) 


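Proof. Jensen inequality for concave functions (i.e. $E[f(g)] \le f(E[g])$ if $f$ is a concave
function) guarantees that

$$E\Big[ \big( (1 - S) I_{\{S < 1\}} \big)^{d/2} \Big] = E\Big[ \Big( \big( (1 - S) I_{\{S < 1\}} \big)^{\frac{d+1}{2}} \Big)^{\frac{d}{d+1}} \Big] \le \Big( E\Big[ \big( (1 - S) I_{\{S < 1\}} \big)^{\frac{d+1}{2}} \Big] \Big)^{\frac{d}{d+1}},$$

noting that $S(x,\varepsilon,d+1) =: S_{d+1} \le S_d := S(x,\varepsilon,d)$ and then

$$\Big( E\Big[ \big( (1 - S_d) I_{\{S_d < 1\}} \big)^{d/2} \Big] \Big)^{\frac{1}{d}} \le \Big( E\Big[ \big( (1 - S_{d+1}) I_{\{S_{d+1} < 1\}} \big)^{\frac{d+1}{2}} \Big] \Big)^{\frac{1}{d+1}}.$$

The latter guarantees that $\big( E\big[ ( (1 - S_d) I_{\{S_d < 1\}} )^{d/2} \big] \big)^{1/d}$ is a non-decreasing monotone
sequence with respect to $d$ whose values are in $(0,1]$ and eventually bounded away from
zero. ■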

Proof of Theorem 4. Given the results in Theorem 1, the thesis holds using the same arguments
as in Delaigle and Hall (2010, Proof of Theorem 4.2); the idea is to combine together
(32), the Stirling expansion of the Gamma function in $V_d$ and the (super-)exponential
eigenvalues decay. ■

6.3 Proof of Theorem 5

In what follows, as done in Section 4, we simplify the notation by dropping the dependence
on $d$ for the density estimators $f_n$ and $\hat f_n$. Moreover, $C$ denotes a general positive
constant. The proof of Theorem 5 uses similar arguments as in Biau and Mas (2012).
Since Hn = it holds Kh^ (u) = h~^K (u). Gonsider 


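$$S_n(x) = \sum_{i=1}^n K\Big( \frac{\|\Pi_d (X_i - x)\|}{h_n} \Big), \qquad \hat S_n(x) = \sum_{i=1}^n K\Big( \frac{\|\hat\Pi_d (X_i - x)\|}{h_n} \Big),$$

then the pseudo-estimator and the estimator are given by

$$f_n(x) = \frac{S_n(x)}{n h_n^d}, \qquad \hat f_n(x) = \frac{\hat S_n(x)}{n h_n^d},$$

and, hence,

$$E\Big[ \big( f_n(x) - \hat f_n(x) \big)^2 \Big] = \frac{1}{(n h_n^d)^2} E\Big[ \big( S_n(x) - \hat S_n(x) \big)^2 \Big].$$

Set $V_i = \|\Pi_d (X_i - x)\|$, $\hat V_i = \|\hat\Pi_d (X_i - x)\|$, and consider the events

$$A_i = \{V_i \le h_n\}, \qquad B_i = \{\hat V_i \le h_n\},$$

then we have the decomposition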


Sn {x) - Sn (x) 


2=1 _ 
n 




hr 


Vi 


K ^ - K \ ^ 


hr 


L 




2 = 1 


2=1 
Since $(a + b)^2 \le 2a^2 + 2b^2$,


E 


Sn (x) - Sn (x) 


1 2 


<2E 


_2 = 1 




Vi 


1 2 


+ 2E 


\i=l 


V,, 


+ 




^ 2=1 


(33) 


Consider now the first addend in the right-hand side of (j33l) : Assumption (B-3) and the 


fact that 
lead to 


E 


E hf 


, 2=1 


Vi-V^ 

Vi 

hn. 


< 


Hd - Ila 


\Xi — x||, where IHI^ denotes the operator norm, 


-K 


V 

hrt. 


1 2 


AinSi 


< CE 


1 2 


fid - lid 




2=1 


Thanks to the Cauchy-Schwarz inequality we control the previous bound by


CE 


r 

2 ■ 



Bd - Hd 

oo 

E 



(34) 


About the first factor in (34), Biau and Mas (2012, Theorem 2.1 (ii)) established that


= O 


(35) 


Consider now the second term in (34). Thanks to Chebyshev's algebraic inequality
(see, for instance, Mitrinović et al., 1993, page 243) and since $E[I_{A_i \cap B_i}] \le E[I_{A_i}]$, for
any $k \ge 1$ it holds


E 


IA - x\\ lAinBi 


< E 


|A-x| 




The fact that E [I^J ~ and Assumption (B-4) give 


E 


IA - x\\ lAiHBi 




with $b > 0$. Hence, the Bernstein inequality (see e.g. Massart, 2007) can be applied:
for any $M > 0$,


'^WXi - x\\lA.^Bi -E 


2=1 




_2 = 1 


> Mnh'^ j < exp . 


This result together with the Borel-Cantelli lemma leads to:

n 

J2\\Xi-x\\lA^,B,<Cnh'^ 


a.s. 


i=l 
and therefore


E 


^ ||Xi - x|| I 




V. 2 = 1 


< Cr?h‘^'^. 


(36) 


Finally, combining results (35) and (36), we obtain:


{nhiY 


rE 


2 = 1 








hrt 


< C 


nhl' 


(37) 


Consider now the second addend in the right-hand side of (33). We only look at

2 


E 


. 2=1 




I 


hn J 


(38) 


because the behaviour of the other addend is similar. Define the sequence $\kappa_n$ so that
$\kappa_n \to 0$ as $n \to \infty$; the following inclusions hold:

Ai^Bi = {Vi < hn} n \Vi > hn] 

= {{hn (1 - Kn) <Vi< hn} U {V < (1 - K„)}) C ]Vi - Vi > hn - Vi] 

C {hn (1 - Kn) <Vi < hn} U ]Vi < hn {1 - Kn) ,Vi - Vi > hn - Vi] 

^ {hn (1 ^n) V Vi ^ hn} U Vi V . 


The latter inclusion and Assumption (B-3) allow us to control (38) by



n 

2 

E 

X] ^Air\Bi 
.2=1 

< 2E 


1 2 


^{hn{l-K-n)<Vi<hn} 

_2 = 1 


+ 2E 


1 2 


El 

. 2=1 


{||nd-nd|| ||Xi-x||>CKn/Jn} 


(39) 

About the first term in the right-hand side of the latter, the Cauchy-Schwarz inequality
gives


E 


El 

. 2=1 




< n^P {hn {1- Kn) <V < hn) ■ 


Since P {hn (1 — k„) <V< hn) ~ ^1 — (1 — Kn)^] , performing a first order Taylor 

expansion of (1 — Kn)'^ in Kn = 0, we get asymptotically 


E 


^{hn{i-l<.n)<Vi<h„} 

2=1 


< CrYh^Kn- 


Similarly, for what concerns the other addend in the right-hand side of (39), we have

IA — x\\ > CKnhn 



n 

2 


E 

-2 = 1 

< n^P ^ 

Hd - Ud 
Thanks to the Markov inequality, Biau and Mas (2012, Theorem 2.1 (iii)) and Assumption
(B-4), it follows


P 




lx — x|| > CKnhri I = O 




Combining the previous results we obtain: 


{nhiY 


rE 


. \ 2=1 


1^. 




n 2 


K,n 




Kr, 


If we choose Kn 


(n5/2/iM)-V2 


oo, as n —)• oo, we obtain: 


E 


\i=l 




hr, 


\ i=l 


Vi 

hr, 


< C 


1 


j^b/Ah^d 


(40) 


In conclusion, (37) and (40) lead to:
1 


{nhiY 


tE 


Sn (x) - Sn (x) 


1 2 


= o 


nKi 


+ 0 


1 


Choose the optimal bandwidth (23) and $p > 2 \vee 3d/10$; then, as $n$ goes to infinity,
the first addend becomes negligible compared to the second one that turns out to
be O ^ Moreover, a direct computation shows that such bound is 

definitively negligible when compared to the “optimal bound” $n^{-2p/(2p+d)}$ for any
$p > 2 \vee 3d/2$ and $d \ge 1$. This concludes the proof.